How Xenopus laevis embryos replicate reliably: investigating the random-completion 

problem 



Scott Cheng-Hsin Yang* and John Bechhoefer 
Department of Physics, Simon Fraser University, Burnaby, B.C., V5A 1S6, Canada 

(Dated: June 20, 2008) 

DNA synthesis in Xenopus frog embryos initiates stochastically in time at many sites (origins) 
along the chromosome. Stochastic initiation implies fluctuations in the time to complete and may 
lead to cell death if replication takes longer than the cell cycle time (« 25 min). Surprisingly, 
although the typical replication time is about 20 min, in vivo experiments show that replication 
fails to complete only about 1 in 300 times. How is replication timing accurately controlled despite 
the stochasticity? Biologists have proposed two solutions to this "random-completion problem." 
The first solution uses randomly located origins but increases their rate of initiation as S phase 
proceeds, while the second uses regularly spaced origins. In this paper, we investigate the random- 
completion problem using a type of model first developed to describe the kinetics of first-order phase 
transitions. Using methods from the field of extreme-value statistics, we derive the distribution of 
replication-completion times for a finite genome. We then argue that the biologists' first solution to 
the problem is not only consistent with experiment but also nearly optimizes the use of replicative 
proteins. We also show that spatial regularity in origin placement does not alter significantly the 
distribution of replication times and, thus, is not needed for the control of replication timing. 

PACS numbers: 87.15.A-, 87.14.G-, 87.17.Ee, 87.15.Ya 



I. INTRODUCTION 

DNA replication is an important yet complicated pro- 
cess that requires not only accurate and efficient DNA 
synthesis but also genome-wide coordination among 
replicative proteins [1]. In a time that can be as short as 
a few minutes, all of a cell's O(10 9 ) bases of DNA must 
be replicated once and only once [2, 3]. Unfaithful and 
uncontrolled replication of the genome — for example, 
mis-replication, partial replication, and re-replication — 
can lead to chromosomal instability that activates pro- 
grammed cell death or oncogenes [4, 5]. Over the past 
few decades, significant advances have been made in 
identifying the molecular basis of DNA repair and re- 
replication prevention [3, 6]. On the other hand, it is 
only in the last few years that large amounts of data 
on the genome- wide coordination have become available. 
In particular, a technique called molecular combing has 
been used to examine the replication state of large frac- 
tions of the genome by controlled stretching of fluores- 
cently labeled replicated and unreplicated regions onto a 
substrate [7, 8]. 

Many of the molecular-combing experiments have been 
done on embryos of the South African clawed frog, Xeno- 
pus laevis [9, 10, 11]. The detailed kinetics of replica- 
tion revealed a particularly interesting scenario where 
stochastic effects play an important role in the DNA 
replication process [9, 12]. In previous work, we mapped 
the stochastic replication process onto a one-dimensional 
nucleation-and-growth process and modeled the detailed 
kinetics of replication seen in molecular-combing exper- 
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iments [11, 13, 14]. In a recent letter, we extended the 
model to quantitatively address a generalized version of 
the "random-completion problem," which asks how cells 
can accurately control the replication completion time 
despite the stochasticity [ ]. Here, we give full details 
about those calculations and go further, to investigate the 
idea that cells regulate the replication process in order to 
minimize their use of cell "resources" and to explore the 
effects of spatial regularity on the placement of origins. 



A. DNA replication in eukaryotic cells 

DNA replication is a two-step process [3]. First, po- 
tential origins — sites where DNA synthesis may start 
- are "licensed" across the genome. For somatic cells, 
licensing occurs in the Gl phase of the cell cycle; for 
embryos, whose abbreviated cell cycles lack the Gl and 
G2 phases, this occurs late in the mitosis (M) phase. 
The process of licensing involves the formation of pre- 
replicative complexes (pre-RC) of proteins. Each com- 
plex is first formed through the binding of a single group 
of six proteins, known as the origin recognition complex 
(ORC), to the DNA. Each ORC, with the help of two 
additional proteins (Cdc6 and Cdtl), then recruits 20- 
40 copies of Mini Chromosome Maintenance (MCM) 2-7 
hexamer rings onto the chromosome [3]. After licensing, 
the second step, DNA synthesis, starts in the synthe- 
sis (S) phase. The synthesis begins with the initiation 
of a potential origin — two of the MCM 2-7 rings - 
triggered by the association of cyclin-dependent kinases 
[3]. Once an origin is initiated, the pre-RC disassembles, 
and two helicases, probably the MCM 2-7 rings, move 
bi-directionally outward from the origin to unwind the 
double-stranded DNA, forming two symmetrically prop- 
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agating replication forks. Polymerases are recruited be- 
hind the forks to synthesize DNA on the single-stranded 
DNA. When two replication forks traveling in opposite 
directions meet, the helicases disassemble, and the two 
growing strands of newly synthesized DNA are joined to- 
gether by DNA ligases. This process is referred to as a 
coalescence. In eukaryotic cells, the processes of origin 
initiation, fork progression, and domain coalescence take 
place at multiple sites throughout S phase until the whole 
genome is replicated. Re-replication is prevented because 
pre-RCs are "non-recyclable" in S phase. When poten- 
tial origins initiate or are passively replicated by other 
replication forks, pre-RCs disassemble and arc inhibited 
from reassembling on the DNA throughout the current S 
phase, thereby preventing re- initiation and re-replication 
[3]. 



B. The random-completion problem 

Replication in Xenopus embryos is interesting because 
the process is stochastic yet the replication completion 
times are tightly controlled. After fertilization, a Xeno- 
pus embryo undergoes 12 rounds of synchronous, unin- 
terrupted, and abbreviated cell cycles (lacking Gl and 
G2 phases), whose durations are strictly controlled by 
biochemical processes that are independent of replica- 
tion [4, 16]. In contrast to the case of most somatic cells, 
these embryonic cells lack an efficient S/M checkpoint 
to delay entrance into mitosis for unusually slow repli- 
cation [ ]. Nonetheless, in each embryonic cell cycle, 
roughly 3 billion basepairs of DNA are replicated in a 
20 min S phase followed by a 5 min mitosis (M) phase 
at 23°C [18, 19]. If replication is not completed before 
the end of mitosis, the cell suffers a "mitotic catastro- 
phe" where the chromosomes break, eventually leading 
to cell death [4, 20, 21]. (See Sec. Ill A for more discus- 
sion.) In replicating the lengthy genome, O(10 6 ) poten- 
tial origins are licensed, without sequence specificity, and 
initiated stochastically throughout S phase [11, 12]. One 
might expect that this spatiotemporal stochasticity leads 
to large fluctuations in replication times, which would 
result in frequent mitotic catastrophes. However, exper- 
iments imply that such catastrophic events for Xenopus 
embryos happen only once in about 300 instances (see 
Sec. Ill A). This means that despite the stochasticity in 
licensing and initiations, Xenopus embryos tightly con- 
trol the duration of S phase, in order to meet the 25 min 
"deadline" imposed by the cell-cycle duration. 

Laskey was the first to ask whether non-sequence- 
specific licensing might lead to incomplete replication 
[ ]. Specifically, he assumed that origins in embryonic 
cells initiate at the start of S phase. (This is now known 
not to be the case [11].) He then noted that if the origins 
were licensed at random, they would have an exponen- 
tial distribution of separations. With the estimates of 
the average inter-origin spacing and fork velocity known 
at that time, one would expect a few large gaps. The 



extra time needed to replicate the gaps would then im- 
ply a replication time larger than the known duration of 
S phase. Even though some details have changed, biolo- 
gists still have such a paradox in mind when they refer 
to the random-completion problem [18]. 

Over the years, biologists have proposed two qualita- 
tive scenarios to resolve the random-completion para- 
dox. The first scenario, the "regular-spacing model," 
incorporates mechanisms that regularize the placement 
of potential origins despite the non-sequence specificity 
to suppress fluctuations in the size of inter-origin gaps 
[18, 23]. The second scenario, the "origin-redundancy 
model," uses a large excess of potential origins that 
arc initiated with increasing probability throughout S 
phase to suppress the fluctuations produced by random 
licensing [16, 24]. Experimentally, the observed replica- 
tion kinetics in Xenopus are compatible with the origin- 
redundancy model, but there is also evidence for limited 
regularity in the origin spacings [11, 18, 25]. 

In this paper, we shall reformulate the random- 
completion problem in a more general way and investi- 
gate both scenarios using a stochastic model and Monte 
Carlo simulations. We consider the case in which ori- 
gin initiation rates can be time dependent and non-zero 
throughout S phase. We then investigate how cells con- 
trol the total replication time despite the non-sequence- 
specific placement and stochastic initiation of potential 
origins. As we shall see, the fluctuations in the replica- 
tion times can be reduced arbitrarily if one allows an un- 
restricted number of initiations. As an extreme example, 
having an infinite number of initiations at time t* im- 
plies that replication would always finish at t* . Thus, an 
even more general formulation of the random-completion 
problem is to ask how reliability in timing control can be 
achieved with a reasonable or "optimal" use of resources 
in the cell. Of course, the terms "reasonable" , "optimal" , 
and "resources" must be carefully defined. 

In the following section, we review and extend the pre- 
viously developed model of replication to derive the dis- 
tribution of replication times [13, 14]. The results will 
show how replication timing can be controlled despite the 
stochasticity. In Sec. Ill, we use the extended model to 
extract replication parameters from in vivo and in vitro 
experiments. In Sec. IV, we compare the extracted in 
vivo "replication strategy" with the strategy that opti- 
mizes the consumption of replication forks. In Sec. V, 
we explore the effect of spatial ordering on the replica- 
tion time via a variant of the regular-spacing model. We 
summarize our findings in Sec. VI. 



II. MODELING REPLICATION COMPLETION 

In previous work, we developed a stochastic model 
of DNA replication [13, 14] that was inspired by the 
Kolmogorov-Johnson-Mehl-Avrami (KJMA) theory of 
phase-change kinetics [26, 27, 28, 29, 30, 31]. The 
KJMA model captures three aspects of phase transfer- 



mation: nucleation of the transformed phase, growth of 
the nucleated domains, and coalescence of impinging do- 
mains. Making a formal analogy between phase trans- 
formations and DNA replication, we map the kinetics 
of the DNA replication onto a one-dimensional KJMA 
model with three corresponding elements: initiation of 
potential origins, growth of replicated domains, and co- 
alescence of replicated domains. Note that our use of a 
phase-transformation model implicitly incorporates the 
observation that, ordinarily, re-replication is prevented. 

Since we neglect any stochasticity in the movement of 
replication forks, the stochastic element of the model lies 
entirely in the placement and initiation of origins [32]. 
The licensing and initiations can be viewed as a two- 
dimensional stochastic process with a spatial dimension 
whose range corresponds to the genome and a temporal 
dimension whose range corresponds to S phase. There 
is good evidence that the positions of the potential ori- 
gins in Xenopus embryos are almost — but not com- 
pletely — random [12, 18, 25]. In this section, we as- 
sume the spatial positions of the potential origins to be 
uniformly distributed across the genome for ease of cal- 
culation. We discuss the implications of origin regular- 
ity in Sec. V. The temporal program of stochastic ini- 
tiation times is governed by an initiation function I(t), 
defined as the rate of initiation per unreplicated length 
per time. In writing down the initiation rate as a simple 
function of time, we are implicitly averaging over any spa- 
tial variation and neglecting correlations in neighboring 
initiations. The I(t) deduced from a previously analyzed 
in vitro experiment on Xenopus implies that the initia- 
tion rate increases throughout S phase [11]. In order to 
explore analytically a family of initiation functions that 
includes such a form, we investigate the distribution of 
replication completion times associated with I(t) = I n t n , 
with /„ a constant. We also examine an alternative 5- 
function form, where all potential origins initiate at the 
start of S phase, as one might expect this to be the best 
scenario for accurate control of replication time. (In the 
early literature on DNA replication, biologists assumed 
this scenario to be true [22].) 

Figure 1 shows schematically the initiations and subse- 
quent development of replicated domains discussed ear- 
lier. After initiation, a replicated domain grows bidi- 
rectionally outward from the origin. The growth stops 
when domains meet and coalescence but proceeds else- 
where. Multiple domains grow and coalesce throughout 
S phase until the entire genome is duplicated. We shall 
assume, for simplicity, that the replication fork velocity 
is constant. Since variations in fork velocity have been 
observed, a constant velocity should be interpreted as 
averaging over the course of S phase [33, 34]. We dis- 
cuss the effect of varying fork velocities in more detail in 
Sec. IIIB. 

Our model results in a deterministic growth pattern 
once the initiations are set. Figure 1 illustrates such de- 
terministic growth and shows that, except at the edges, 
there is a one-to-one mapping between the initiations 
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FIG. 1: Schematic of the DNA replication model. A horizon- 
tal slice in the figure represents the state of the genome at a 
fixed time. The lighter (darker) gray represents unreplicated 
(replicated) regions. Open circles denote initiated origins, 
while filled circles denote coalescences. The dark dotted line 
cuts across the last coalescence, which marks the completion 
of replication. The slope of the lines connecting the adjacent 
open and filled circles gives the inverse of the fork velocity. 



and the coalescences. It follows that every distribution 
of initiations 4>i(t) determines an associated distribution 
of coalescences <fi c (t)- Since the completion of replica- 
tion is marked by the last coalescence, the problem of 
determining the time needed to replicate a genome of 
finite length is equivalent to that of determining the dis- 
tribution of times at which the last coalescence occurs. 
We refer to this distribution as the "end-time" distribu- 
tion (j) e {t). Below, we derive an analytical approxima- 
tion to the end-time distribution function for arbitrary 
I(t). This analytical result will allow us to investigate 
how licensing and initiation programs affect the timing 
of replication completion. 

In addition to analytic results, we also carried out ex- 
tensive numerical simulations of DNA replication. The 
simulation algorithm used is a modified version of the 
previously developed "phantom- nuclei algorithm" [13]. 
The phantom-nuclei algorithm includes three main rou- 
tines: the first determines the random-licensing positions 
and the origins' stochastic initiation times via Monte 
Carlo methods [35] ; the second implements the determin- 
istic growth; and the third eliminates passively replicated 
origins. Once potential origins are licensed, the algorithm 
can calculate the state of the genome at any time step 
without computing intermediate time steps. We modified 
our earlier code to generate end-time distributions using 
the bisection method to search for the first t at which the 
replication fraction / becomes 1 [3G]. All programming 
was done using Igor Pro v. 6.01 [37]. 



A. The end-time distribution 

In previous work, we showed that for an infinitely long 
genome, the fraction / of the genome that has replicated 
at time t is given by [13] 

f{t) = 1 - e~ 2vh ^ , (1) 
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where v is the fork velocity (assumed constant), h(t) = 
J Q g(t')dt' and g(t) = L I(t')dt'. Equation 1 predicts 
that an infinite time is needed to fully duplicate the 
genome; however, since all real genomes are finite in 
length, they can be fully replicated in a finite amount 
of time. During the course of replication, as long as the 
number of replicated domains is much greater than one, 
the infinite-genome model is reasonably accurate. How- 
ever, since the number of domains is small at the be- 
ginning and end of replication (/ — * and / — * 1), we 
expect discrepancies in those regimes. In particular, to 
calculate the finite replication time expected in a finite 
genome, we need to extend our previous model. 

We begin by introducing the hole distribution, 
rift (a;, t) = g 2 (t) exp[— g(t)x — 2vh(t)] which describes the 
number of "holes" of size x per unit length at time t [13]. 
A "hole" is the biologists' term for an unreplicated do- 
main surrounded by replicated domains. Since a coales- 
cence corresponds to a hole of zero length, we define the 
coalescence distribution <p c (t) oc n/ l (0,t). Normalizing by 
imposing the condition J <fi c (t)dt — 1, we find 

Mt) = ^We-^ , (2) 

where L is the genome length and N the expected total 
number of initiations. Note that N a is also the total num- 
ber of coalescences because of the one-to-one mapping 
discussed in the previous subsection. One can calculate 
N via 

/>oo />oo 

N = L I(t)[l-f{t)]dt = L I(t)e~ 2vh ^dt , (3) 
Jo Jo 

where the factor [1 — f(t)] arises because initiations can 
occur only in unreplicated regions. The integrand in 
Eq. 3 divided by N a is the initiation distribution (fii(t)dt, 
which corresponds to the number of initiations between 
time t and t + dt. 

Given the initiation distribution, we picture the ini- 
tiations as sampling N Q times from 4>i(t). This implies 
that N Q independent coalescence times are sampled from 
4> c (t). The replication completion time, finite on a finite 
genome, can then be associated with the largest value 
of the N Q coalescence times, and the end-time distribu- 
tion is the distribution of these largest values obtained 
from multiple sets of sampling from <p c (t). At this point, 
we apply extreme-value theory (EVT) to calculate the 
end-time distribution. EVT is a well-established statis- 
tical theory for determining the distributional proper- 
ties of the minimum and maximum values of a set of 
samples drawn from an underlying "parent" distribution 
[38, 39]. The properties of interest include the expected 
value, fluctuations around the mean, frequency of occur- 
rence, etc. EVT plays a key role in the insurance in- 
dustry, where, for example, the "100-year-flood" prob- 
lem asks for the expected maximum water level over 100 
years [40]. In physics, EVT has attracted increasing in- 
terest and been applied to analyze crack avalanches in 



self-organized material [41], degree distribution in scale- 
free networks [42], and many other problems. 

EVT is powerful because of its universality. The key 
theorem in EVT states that the distribution of the ex- 
tremes of an independent and identically distributed ran- 
dom variable tends to one of three types of extreme value 
distributions, the Gumbel, Frechet, and Weibull distri- 
butions, depending only on the shape of the tail of the 
underlying distribution. The universality of the extreme 
value distribution with respect to the underlying distri- 
bution is similar to that of the better-known Central 
Limit Theorem [ ]. For an underlying distribution with 
an unbounded tail that decays exponentially or faster, 
the distribution of the extremes tends to a Gumbel dis- 
tribution. Such is the case of Xenopus since the under- 
lying distribution, the coalescence distribution c (t), is 

_ 4 

approximately proportional to e T , where r is a dimen- 
sionless time [44, 45]. The other initiation functions we 
consider also lead to the Gumbel distribution. 
The Gumbel distribution, 

p{x) = i exp (-x - e~ x ) , x=t ^~Q —> ( 4 ) 

depends on only two parameters, t* and (3 [38, 39, 46]. 
The former is a "location" parameter that gives the mode 
of the distribution. The latter is a "scale" parameter pro- 
portional to the standard deviation. We follow standard 
procedures to obtain t* and (3 as a function of the initi- 
ation rate and the fork velocity [38, 46]. The main step 
is to recognize that the cumulative end-time distribution 
$ e (i), which has a Gumbel form, is equal to the product 
of N a cumulative coalescence distributions, each result- 
ing from the same initiation distribution <f>i(i). In other 
words, the probability that N coalescences occur at or 
before time t is equivalent to the probability that the 
last of them occurred at or before time t, which is also 
the probability that the replication will finish at or be- 
fore time t. For our case, we find that the mode t* is 
determined implicitly by 

AT [l-$ c (t*)] = l (5) 

and P « l/[N Q (j) c (t*)]. In Eq. 5, $ c (i) is the cumulative 
distribution of 4> c (t)', thus, [1 — $ c (i)] is the probabil- 
ity that a coalescence would occur at or after time t. 
Equation 5 then implies that given a total of N a coales- 
cences, t* is the time after which the expected number of 
coalescences is one, and therefore, the typical end-time. 
The Gumbel form of the end-time distribution is one of 
our main results, as it allows quantitative comparison be- 
tween the fluctuations of completion times resulting from 
different initiation functions. 

Below, we derive the end-time distribution for a power- 
law initiation function I n (t) — I n t n (where n > — 1) and 
a delta-function initiation function I$(t) = I$5(t). In 
the power-law case, h(t) oc i™ +2 , while for the 6- function 
case, h(t) oc t. From Eq. 2, both initiation forms give 
rise to coalescence distributions that decay exponentially 
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or faster, and thus, both forms will lead to an end-time 
distribution of the Gumbel form. Using these initiation 
functions, we see that the coalescence distribution given 
by Eq. 2 is completely determined by three parameters: 
the fork velocity v, the initiation strength given by the 
prefactor /„ or Is, and the initiation form determined by 
n or 5(t). The relationship between these three parame- 
ters and the two Gumbel parameters reveals how different 
"initiation strategies" affect the completion time. 

We write the cumulative distribution Q c (t) of the coa- 
lescences as 1 — f°° c/> c (t')dt' . Then, using integration by 
parts, we obtain 



L 



L 

X 



Substituting Eq. 6 into Eq. 
equation 

2vh(t*) =ln[(l -a)Lg(t*)], 



we obtain a transcendental 



J™ I(t)e- 2vh ^dt 
g ( t *) e -2vh(r) 



that relates the initiation parameters to f* 
width, Eqs. 2 and 7 give 



For 



(7) 
the 



(8) 



2vg(t*) ' 

indicating that the width of the end-time distribution, 
f3, is inversely proportional to g(t*), the typical number 
of potential origins per length. In practice, given ex- 
perimentally observed quantities such as v, f*, and L, we 
solve Eqs. 7 and 8 numerically to determine the initiation 
prefactor (Ig or /„) and the width for different initiation 
forms [6(t) or f™]. Nevertheless, an analytical approx- 
imation of Eqs. 7-8 is possible, as the factor a is often 
small. For instance, in the power-law I(t) case, introduce 
a function 77(f) = be~ at that decays more slowly than 
<j>i(t). Then, imposing rj(t*) = <f>i(t*) so that 17(f) > <j>i(t) 
for f > t* , we find a to be at most 0(1O -2 ). Neglecting 
a, we then obtain the analytical approximations 



(n + l)(n + 2) 



In 



L(n + 2) 
2vt* n+2 



(9) 



(10) 



2vt* n+2 
n + l 
2vl n t* n+1 

that show the explicit relationship between the initiation 
parameters and the Gumbel parameters. 

In summary, given a realistic initiation function /(f) 
and fork velocity v, we have shown that the distribution 
function of replication end-time tends toward a Gumbel 
form. We have also shown how the replication parame- 
ters relate to the location and scale Gumbel parameters 
analytically. 

B. Replication timing control 

As a first step toward understanding the solutions to 
the random-completion problem, we consider the end- 
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FIG. 2: (Color online) (a) The end-time distribution with 
fixed mode f* = 38 min. Markers are the results of the Monte 
Carlo simulations. Each distribution is estimated from 3,000 
end-times. The "5-function" corresponds to initiating all po- 
tential origins simultaneously at t = min. The n = 0, 
1, 2 cases correspond to constant, linearly increasing, and 
quadratically increasing initiation rates, respectively. Solid 
lines are Gumbel distributions with t* and (3 calculated ac- 
cording to Eqs. 7-8. There are no fit parameters, (b) Ini- 
tiation distribution 4>i(t) for n = 0, 1, 2. Parameter values 
correspond to those in (a). Error bars are smaller than marker 
size. Solid lines are calculated from Eq. 3. Again, there are 
no fit parameters. 



time distributions produced by different initiation func- 
tions. From these results and the theory developed, we 
infer two heuristic principles for controlling the end-time 
distribution: the first concerns the width, while the sec- 
ond concerns the mode. We first explore how the width 
(3 depends on the initiation form [6(t) and t n ] by simulat- 
ing the replication process while constraining the typical 
replication time and fork velocity to match the values in- 
ferred from in vitro experiments: f* = 38 min and v = 0.6 
kb/min. (As we discuss in Sec. Ill, replication in vitro is 
slower than in vivo.) The genome length L is 3.07 x 10 6 
kb throughout the paper [47]. The prefactors Ig and /„ 
are then calculated using Eq. 7. 

The result shown in Fig. 2(a) is perhaps counter- 
intuitive: initiating all origins in the beginning of S phase, 
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which corresponds to a <5-function I(t), gives rise to the 
broadest distribution. Initiating origins throughout S 
phase narrows the end-time distribution. The narrow- 
ing is more pronounced as the power-law exponent n in- 
creases. These observations can be explained by Eq. 8, 
which states that the width is inversely proportional to 
the typical density of potential origins. The physical in- 
terpretation is that having fewer potential origin sites 
leads to more variation in the spacing between potential 
origins. This in turn induces fluctuations in the largest 
spacings between initiated origins, which widens the end- 
time distribution. In this light, Fig. 2(a) shows that when 
t* is fixed, the S- function case uses the fewest potential 
origins and thus produces the widest distribution. In 
contrast, a large power-law exponent n implies the use of 
many potential origins and thus produces a narrow dis- 
tribution. In summary, the first heuristic principle is that 
the end-time distribution can be narrowed arbitrarily by 
increasing the number of potential origins in the system. 

The second principle is that given an excess of potential 
origins, cells can initiate origins progressively throughout 
S phase instead of all at once, lowering the consumption 
of resources while still controlling the typical replication 
time. In S phase, initiation factors and polymerases are 
recyclable proteins; i.e., they can be reused once they 
are liberated from the DNA [48]. Progressive initiation 
then allows a copy of the replicative protein to be used 
multiple times. Compared with initiating all origins at 
once, this strategy requires fewer copies of replicative pro- 
teins and thus saves resources. This notion of minimizing 
the required replication resources is further discussed in 
Sect. IV. 

Figure 2(b) shows that increasing the exponent n re- 
sults in the "holding back" of more and more initiations 
until later in S phase. Comparing this with Fig. 2(a), 
one finds that holding back initiations corresponds to 
narrowing the end-time distribution. Although many po- 
tential origins are passively replicated and thus never ini- 
tiate, the timing of replication can still be accurately con- 
trolled, as initiations now occur in the "needed places." 
Since the probability of initiation inside a hole is propor- 
tional to the size of the hole, the held-back initiations are 
more likely to occur in large holes. This filling mechanism 
is made efficient by increasing I(t) toward the end of S 
phase so that any remaining large holes are increasingly 
likely to be covered. 

One subtle point of the origin-redundancy scenario is 
that although the potential origins arc licensed at ran- 
dom, the spacings between initiated origins form a dis- 
tribution pi (s) with a non-zero mode that contrasts with 
the exponential distribution of spacings between poten- 
tial origins. An example of the pi(s) is shown later in 
Sec. V. In earlier literature, before experiments showed 
that initiations can take place throughout S phase, bi- 
ologists believed that all potential origins initiate at the 
start of S phase. In this 5-function case, the distribution 
of the inter-potential-origin spacing is the same as that of 
the spacing between fired origins (inter-origin spacing). 



However, with an increasing I(t), a peak will arise in pi(s) 
because closely spaced potential origins are not likely to 
all initiate but be passively replicated by a nearby initia- 
tion. This passive replication effect suppresses the likeli- 
hood of having small inter-origin spacings and thus cre- 
ates a non-zero mode in the spacing distribution. One 
should be careful not to confuse the two distributions. 

In conclusion, we have shown that a large excess of po- 
tential origins suppresses fluctuations in the size of inter- 
potential-origin gaps while the strategy of holding back 
initiations allows control of the typical replication time. 
In the next section, we review what is known experimen- 
tally about DNA replication in Xenopus embryos, in light 
of the analysis we have just presented. 



III. ANALYSIS OF REPLICATION 
EXPERIMENTS 

In the previous section, we showed that given an initia- 
tion function and a fork velocity, one can find the associ- 
ated end-time distribution using EVT. In this section, we 
review what is known experimentally about these quan- 
tities in Xenopus embryos. There have been two classes 
of experiments: in vivo, where limited work has been 
done [4, 20, 21], and in vitro, where rather more de- 
tailed studies have been performed on cell-free extracts 
[9, 10, 11, 18]. Typically, embryo replication in vivo takes 
about 20 minutes of the (abbreviated) 25-minute cell cy- 
cle [16, 19]. As we discuss below, in vivo experiments 
imply that replication "failure" — incomplete replication 
by the end of the cell cycle — is very unlikely, occurring 
only once in about 300 instances. The in vitro exper- 
iments on cell-free extracts give more detailed informa- 
tion about the replication process, including an estimate 
of the in vitro initiation function I V itro(t). However, the 
typical replication time in vitro is about 38 min, not 20 
min, and it is not obvious how one can apply the results 
learned from the in vitro experiments to the living sys- 
tem. Below, we propose a way to transform I V itro(t) into 
an estimate of the in vivo initiation function I V i VO (t) that 
satisfies the failure probability of the in vivo system. 



A. The in vivo experiments 

A low replication-failure rate is remarkable because 
Xenopus embryos lack an efficient S/M checkpoint to de- 
lay cell cycle progression when replication is incomplete 
[16]. If chromosomes separate before replication is com- 
plete, cells suffer "mitotic catastrophe," which leads to 
apoptosis [ ]. Thus, a low failure rate in embryonic cells 
implies that replication timing is precisely controlled by 
the initiation function and fork velocity. Mathematically, 
we can test whether an initiation function is realistic by 
calculating the rate of mitotic catastrophe F it implies. 
To evaluate F, we first choose a time t** at which mitotic 
catastrophe occurs if replication is not fully completed. 
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Then, 

F = J <f> e (t)dt = l-* e (0 • (11) 

As a first step in estimating F, we identify t** with 
the cell cycle time (as 25 min) [19]. Our identification is 
justified by observations that imply that replication can 
continue throughout mitosis, if needed [20]. Thus, even if 
the bulk of replication is completed before entering mito- 
sis, small parts of the genome may continue to replicate, 
essentially until the cell totally divides. However, if while 
the cell is dividing, unreplicated regions of the chromo- 
some segregate, mitotic catastrophe would cause the two 
daughter cells to inherit fragmented chromosomes. 

Having identified t** , we estimate F using data from 
an experiment on DNA damage in embryos [4, 21]. In [4], 
Hensey and Gautier found that cells with massive DNA 
damage (induced by radiation) will continue to divide 
through 10 generations. Then, at the onset of gastrula- 
tion, which occurs between the 10 th and 11 th cleavages, 
an embryo triggers a developmental checkpoint that ac- 
tivates programmed cell death. The role of cell death 
is to eliminate abnormal cells before entering the next 
phase of development, where the the embryo's morphol- 
ogy is constructed via cell migration. In Hensey and Gau- 
thier's study, abnormal cells were detected using TUNEL 
staining, a technique for detecting DNA fragmentation in 
cells. In a later work investigating the spatial-temporal 
distribution of cell deaths in Xenopus embryos, they re- 
ported that, at gastrulation, 67% of 237 embryos, each 
containing 1024 cells, had more than 5 TUNEL-stained 
cells [21]. We can estimate F from the above observa- 
tions using a simple model based on the following four 
elements: 

1. All cells divide; each produces two cells. 

2. If a cell has an abnormal chromosome, all its 
progeny are abnormal because replication can at 
best duplicate the parent's chromosome. 

3. Failure to replicate all DNA before the end of a cell 
cycle is the main cause of abnormal chromosomes 
and leads to apoptosis at gastrulation. 

4. All normal cells in all rounds of cleavage have the 
same probability F of becoming abnormal because 
of incomplete replication. 

A schematic depiction of our model is shown in Fig. 3(a). 

The above model can be described by a standard 
Galton- Watson (GW) branching process [49], where the 
number of proliferating progeny generated by a normal 
cell is an independent and identically-distributed random 
variable. GW processes obey recursion relations that can 
be solved analytically using probability generating func- 
tions; however, the solution in our case is too complex to 
be helpful. We thus turned to numerical analysis. 

We used Monte Carlo methods to simulate the branch- 
ing process outlined above. Each embryo, after going 
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FIG. 3: (a) Schematic diagram of the simple model described 
in the text. Open circles represent normal proliferating cells, 
while filled circles are abnormal cells. The numbers indicate 
the round of cleavage. Once a cell fails to replicate properly, 
all its progeny will be abnormal, (b) Cumulative distribution 
of the number of dead cells at gastrulation (between cleavage 
10 and 11) generated using Monte Carlo simulation. The dis- 
tribution satisfies the constraint that 33% of the embryos have 
5 or fewer abnormal cells. The inset shows the convergence of 
the gradient search to F = 3.73 ± 0.01 x 10~ 3 . The average 
and standard deviation of the mean are computed over the 
last 40 values. 



through 10 rounds of division, contains m abnormal cells 
that commit apoptosis before the 11 th division. Simu- 
lating N embryos results in a distribution of number of 
deaths. We then compare the evaluation of the cumu- 
lative distribution at 5 death events with the reported 
likelihood, which states that 33% of the time, there are 5 
or fewer dead cells in 1024 cells [21]. Figure 3(b) shows 
the cumulative distribution that matches the reported 
numbers. To find F, we used a gradient-based method 
for finding roots of stochastic functions. In this case, the 
input is the failure rate F, and the function evaluates the 
number and likelihood of deaths via a Monte Carlo simu- 
lation of the branching process of 237 embryos. We found 
that the numbers reported in [21] imply F — 3.73 ± 0.01 
[Fig. 3(b) inset] [50, 51]. In summary, we inferred the 
failure rate in Xenopus embryo replication to be about 1 
in 300. 

Comparing Eq. 1 1 with the standard cumulative Gum- 
bel distribution given by the integral of Eq. 4, one can 
relate the quantities t** and F to the Gumbel parameters 
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t** =t* - j3{t*)\n 



In 



1 - F 



(12) 



For F 1 <§; 1, the expression simplifies to t** t* — 
fl(t*) ln(F'), which implies that the end-time is insensitive 
to the exact value of F: an order-of-magnitude estimate 
suffices. 



B. Connecting in vitro to in vivo 

As discussed above, the most detailed experiments 
on replication in Xenopus have been conducted on cell- 
free egg extracts. In previous work [11], we modeled a 
molecular-combing experiment on such an in vitro sys- 
tem and inferred the time-dependent initiation function 
Ivitro(t) (approximately quadratic [44, 45]), a fork ve- 
locity of 0.6 kb/min (averaged over S phase [33]), and 
a typical replication time t* of 38 min. In contrast, the 
typical replication time in living embryos is only 20 min. 
While it is generally believed that DNA replication in 
the two settings occurs in a similar way, the overall du- 
ration of S phase is an obvious difference that must be 
reconciled. We thus have a dilemma: the known replica- 
tion parameters, v and I(t), are extracted from in vitro 
experiments while the failure rate F is derived from ob- 
servations of cells in vivo. Is it possible to "transpose" 
the results from the in vitro experiments to the in vivo 
setting? Although any such transformation is obviously 
speculative, we propose here a simple way that is consis- 
tent with known experimental results. 

We hypothesize that, except for the fork velocity, repli- 
cation is unaltered between the in vitro and in vivo sys- 
tems. The subtlety is that there are several conceivable 
interpretations of "unaltered" replication. One could 
keep Ivitroif) the same; however, this is not reasonable 
in that the dramatic increase in Ivitro{^)i i ~ IT. 4 
min, would be moved from the midpoint of replication to 
the end [11]. Alternatively, one could express the initi- 
ation function in terms of the fraction of replication, ie. 
/ = /(/), and preserve this function. In this case, one 
would need a fork velocity of about 2.2 kb/min to pro- 
duce the extracted in vivo failure rate. Although this is a 
reasonable fork speed in systems such as the Drosophila 
embryo, it is about twice the maximum fork speed ob- 
served in Xenopus embryonic replication in vitro [33]. 
The third possibility is to preserve the maximum num- 
ber of simultaneously active replication forks. Intuitively, 
this is plausible as each replication fork implies the exis- 
tence of a large set of associated proteins. The maximum 
fork density then gives the minimum number of copies of 
each protein set required. Thus, we are in effect assum- 
ing that the numbers of replicative proteins remains the 
same in both cases. 

The simplest way to preserve fork usage is to rescale 
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FIG. 4: Density of simultaneously active replication forks 
throughout S phase, n/(t). The dotted curve corresponds 
to the in vitro fork usage while the solid curve is the 
rescaled fork usage that satisfies the constraints t** = 25 
min and F = 0.00373. The rescaled n/{t) is generated us- 
ing I v i vo (t/t* ivo ) « 2I vit ro{t/tlitro) and v = 1.030 kb/min. 
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where t* vitro w 38 min and t sca ie is chosen so that t** = 25 
min and F = 3.73 ± 0.01 x 10~ 3 . We found that the 
in vitro fork usage is preserved by using the rescaling 
Ivivo{t/t sca i e ) ra 2I mtro (t/t* mtro ) and v = 1.030 ± 0.001 
kb/min (Fig. 4). The error on v is the consequence of 
the uncertainty in F. This velocity has a significant in- 
terpretation. In a recent experiment, Marheineke and 
Hyrien found that the fork velocity in vitro is not con- 
stant but decreases linearly from about 1.1 kb/min to 0.3 
kb/min at the end of S phase [33]. The decrease in fork 
velocity suggests that in vitro replication progressively 
depletes rate- limiting factors (e.g., dNTP) throughout S 
phase. We suggest that our extracted v « 1 kb/min 
means that in vivo systems are able to maintain the con- 
centration of rate-limiting factors, perhaps by regulating 
their transport across the nuclear membrane [52, 53], to 
maintain a roughly constant fork velocity throughout S 
phase. In summary, by preserving the rescaled version of 
the in vitro fork usage rate, we have transformed I V itro(t) 
into an I V i VO (t) that results in reasonable replication pa- 
rameters and reproduces the in vivo failure rate. 



IV. OPTIMIZING FORK ACTIVITY 

The random-completion problem mentioned in Sec. I 
can be quantitatively recast into a problem of searching 
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for an initiation function that produces the in vivo failure 
rate constraint in Eq. 11. In Fig. 5(a), we show that any 
initiation form with the proper prefactor can satisfy the 
constraint on the integral of the end-time distribution, in- 
cluding the transformed in vivo initiation function. Can 
we then understand why Xenopus embryos adopt the 
roughly quadratic I(t) and not some other function of 
time? 

To explore this question, we calculate for the different 
cases of I(t) the maximum number of simultaneously ac- 
tive forks. Figure 5(b) shows that initiating all origins at 
the start of S phase [setting I(t) ~ 6(t)] requires a higher 
maximum than a modestly increasing I(t). At the other 
extreme, a too rapidly increasing I(t) (high exponent n) 
also requires many copies of replicative machinery be- 
cause the bulk of replication is delayed and needs many 
forks close to the end of S phase to finish the replication 
on time. Thus, intuitively, one expects that an interme- 
diate I(t) that increases throughout S phase — but not 
too much — would minimize the use of replicative pro- 
teins. Figure 5(b) hints that the in vivo initiation func- 
tion derived from in vitro experiments may be close to 
such an optimal I(t), as the number of resources required 
by I V ivo(t) is close to the minimum of the power-law case. 

It is not immediately clear which replication resources 
should be optimized. Here, we propose that the max- 
imum number of simultaneously active forks be mini- 
mized. Above, we argued that the maximum of n/(t) 
gives the minimum number of copies of the proteins re- 
quired for DNA synthesis. Moreover, since the unwinding 
and synthesis of DNA at the forks create torsional stress 
on the chromosomes, minimizing the number of active 
forks would minimize the complexity of the chromosome 
topology, which may help maintain replication fidelity 
[54]. For these reasons, the maximum number of active 
forks is a plausible limiting factor that causes replica- 
tion to proceed the way it does. Below, we calculate the 
optimal I(t) and compare it with I V i VO (t). 

The number of forks active at time t is given by n/ (i) = 
2g(t) exp[— 2vh(t)]. One can find the I(t) that optimizes 
the maximum of nf{t) by minimizing 



n m ax[I(t)} = lim 

p — >oo 



t** 



"I 1/p 



(15) 



This is a common analytic method to optimize the max- 
imum of a function [55]. The trick is to analytically 
calculate the Euler-Lagrange equations for finite p and 
then take the limit p — > oo, where the contribution of 
the maximum dominates the integrand. The associated 
Euler-Lagrange equation is 



h(t) = 2vh 2 (t) , 



(16) 



where we recall that h(t) = I(t) and h(t) — g(t). Note 
that Eq. 16 is independent of p, suggesting that the op- 
timal rif(t) does not have a peak. Solving Eq. 16 subject 
to the boundary condition that the replication fraction 
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FIG. 5: (Color online) (a) Replication end-time distribution 
with i** fixed to be 25 min and F — 0.00373. Similar to 
Fig. 2(a), the width decreases with an increase in the exponent 
n. (b) Typical maximum number of simultaneously active 
forks. The curve is obtained from extracting the maximum 
value of n/(t) for different exponents n. 



be at t = [ie., h(0) = 0] and 1 at t = t**, we obtain 



lopt (t) 



1 

2vt* 



5(t) 



1 1 

¥* (i-t/t**) 2 



(17) 



Inserting the result from Eq. 17 into Eq. 13, one sees that 
nf{t) = l/vt** indeed is constant throughout S phase 
and is about three times smaller than the maximum num- 
ber of simultaneously active forks in vivo [Fig. 6(c)]. This 
optimal solution, like I V i VO (t), increases slowly at first, 
then grows rapidly toward the end of S phase [Fig. 6(b)]. 
However, this initiation function is unphysical, as the 
diverging initiation probability at t — > t** implies an in- 
finite number of initiations at the end of S phase. In 
effect, a constant fork density implies that when the pro- 
tein complexes associated with two coalescing forks are 
liberated, they instantly find and attach to unreplicated 
parts of the chromosome. It also implies that at the end 
of S phase, all the replication forks would be active on 
a vanishingly small length of unreplicated genome. Both 
implications arc unrealistic. 

To find a more realistic solution, we tamed the be- 
havior of the initiation rate for t — > t** by adding a 
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constraint. A natural constraint to impose is that the 
failure rate in vivo be satisfied. The infinite initiations 
at t = t** implied by Eq. 17 means that the replication 
always finishes exactly at t** and the failure rate is zero. 
Therefore, having a non-zero failure rate would force the 
number of initiations to be finite. This constraint is also 
consistent with the idea that the replication process is 
shaped by the evolutionary pressure of survival. The 
new optimization quantity is then 

J [/(*)] = max {n f [/(*)]} + \ {F [1(f)] - F mvo } , (18) 

where the first term is the maximum of the fork density, 
and the second term is a penalty function that increases 
J for F ^ F vivo . The strength of the penalty is set by 
the Lagrange multiplier A. The time associated with F 
is t** — 25 min throughout this section. 

Substituting Eq. 15 into the first term of Eq. 18 and 
applying the method of variational calculus, we obtained 
an integro-differential equation that turns out to be stiff 
mathematically and thus difficult to solve. The diffi- 
culty in analytic methods is that the gradient of Eq. 15 
is highly nonlinear and that F depends on t* , which is 
not readily expressible in terms of the basic replication 
parameters I(t) and v. For these reasons, we turned to 
a gradient-free numerical method called finite difference 
stochastic approximation (FDSA) [51]. Although this 
search method is used for stochastic functions (as the 
name suggests), the method is just as suitable for deter- 
ministic functions. The basic concept is that the gra- 
dient of a function, which encodes the steepest-decent 
direction toward a local minimum, can be approximated 
by a finite difference of the function. The advantage of 
this method is that we can replace the complicated eval- 
uation of the variation 6J[I(t)] by the easily calculable 
difference J[I + AI] - J[I - AI] . 

Figure G shows the results of the FDSA search. We per- 
formed FDSA under several different conditions, with the 
initial search function being I V i VO (t). First, we investigate 
the case where the optimization objective J is simply 
max{nf}, with no further constraint or boundary condi- 
tion [except 71/ (f) > 0]. The markers in Fig. 6(a) shows 
that the optimal solution lingers near max{nj} = 0.05 
(slow decrease in J) and then goes to the global minimum 
(zero). In the transient regime (search step between 50 
to 100), the fork density evolves from a bell curve to 
a constant, which is the form of the calculated optimal 
solution. For search step > 100, the fork density (a con- 
stant) decreases to zero if no constraint is imposed. This 
zero solution corresponds to the case where no initia- 
tion or replication occurs. However, when the boundary 
condition used in the calculation (replication finished at 
t**) is imposed, the FDSA algorithm indeed finds the 
7i/ (f) — 1/vt** optimal solution (data not shown). 

The second search was implemented following Eq. 18, 
where the constraint in F is added. Figure 6(c) shows 
that the fork solution is no longer a constant because the 
tail needs to decrease to satisfy F = F vivo . The cor- 
responding effect on the I(t) is a decrease toward the 
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FIG. 6: (Color online) Results of a numerical search for op- 
timal initiation functions under various constraints. The la- 
bel "vivo" corresponds to the in vivo case; "optimal" cor- 
responds to optimizing maximum fork density with no con- 
straint (corresponds to Eq. 17); ll F V ivo" corresponds to op- 
timization with the constraint that the failure rate be equal 
the F v i vo extracted in Sec. Ill A; u F V ivo+g(0y corresponds to 
optimization with the constraint of Fvivo and the constraint 
that g(0) = 0. (a) Finite difference stochastic approximation 
search. The markers shows the search for the case of min- 
imizing the max{nf} with no constraint and no boundary 
condition. The horizontal lines are the maximum fork den- 
sity for different search conditions, (b) Initiation rate I(t). 
The Ivivo shown is in the bi-linear form [44] . (c) Fork density 
nf(t). Line types correspond to the same cases as in (b). 



end of S phase [Fig. 6(b)]. Interestingly, the mechanism 
of spreading out the fork density to minimize the maxi- 
mum fork usage seen in the analytical calculation is still 
present here, as shown by the plateau at the beginning. 
The J(t) then behaves as predicted by Eq. 17 for most 
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of S phase — a <5-function at the beginning followed by 
a rate that increases sharply at the end of S phase. 

In the third search, in addition to Eq. 18, we imposed 
that there be no burst of initiation at the beginning of 
S phase [g(0) = 0], as seen in experiments. Figure 6(c) 
shows that with the addition of each constraint, the max- 
imum of the fork density increases toward the in vivo 
value. Further, besides satisfying the constraints and 
boundary conditions, the fork density profiles show a 
common feature of forming as lengthy a plateau as possi- 
ble to minimize the maximum. The resulting I(t) is very 
similar to I V i VO [Fig. 6(b)]. 

However, there are still some differences between the 
result of the third search and n v ™° . In particular, the op- 
timal fork solution increases much faster at the beginning 
of S phase than n v ™° does, to spread out the fork activi- 
ties. Minimizing the maximum number of initiations also 
leads to the same feature of a fast initial increase in the 
initiation activities followed by a plateau. These observa- 
tions then suggest that while minimizing the maximum of 
simultaneously active replicative proteins may be a fac- 
tor that determines the replication pattern, there must a 
stronger limiting factor at the beginning of S phase. A 
plausible hypothesis is that the copy number of at least 
some of the replicative proteins is small to start with 
but gradually increases with nuclear import throughout 
S phase [56]. With this additional constraint, the repli- 
cation activities at the beginning of S phase are limited 
by the small numbers of available replicative proteins. 
In conclusion, the optimization method presented in this 
section is useful because it connects the replication pro- 
cess with an optimal criterion that is plausibly connected 
with evolutionary selection pressure. This allows one to 
explore the limiting factors of replication. Moreover, the 
method is general in that it allows one to explore the 
consequences of a wide range of possible constraints. 



V. THE LATTICE-GENOME MODEL: FROM 
RANDOM TO PERIODIC LICENSING 

Until now, we have assumed a spatially random dis- 
tribution of potential origins. In this section, we explore 
the implications of spatial ordering among the potential 
origins on the end-time distribution. We have two moti- 
vations. First, an "obvious" method for obtaining a nar- 
row end-time distribution is to space the potential origins 
periodically and initiate them all at once. However, such 
an arrangement would not be robust, as the failure of 
just one origin to initiate would double the replication 
time. Still, the situation is less clear if initiations are 
spread out in time, as the role of spatial regularity in 
controlling inter-origin spacing would be blurred by the 
temporal randomness. 

Our second motivation is that there is experimental 
evidence that origins axe not positioned completely at 
random. A completely random positioning implies that 
the distribution of gaps between potential origins is ex- 



ponential, resulting in many small inter-potential-origin 
spacings. However, in an experiment of plasmid replica- 
tion in Xenopus egg extracts, Lucas et at found no inter- 
origin gap smaller than 2 kb [25]. In a previous analysis, 
we also observed that, assuming random licensing, one 
expects more inter-origin gaps less than 8 kb than were 
observed and fewer between 8-16 kb [14]. Second, exper- 
iments have suggested a qualitative tendency for origins 
to fire in groups, or clusters [18]. These findings collec- 
tively imply that there is some spatial regularity in the 
Xenopus system, perhaps through a "lateral inhibition" 
of licensing potential origins too closely together. Our 
goal is to find an "ordering threshold," at which point 
the resulting end-time distribution starts to deviate from 
the random- licensing case. We will then argue that one 
needs not worry about effects of regular spacing in the 
Xenopus system because the experimental degree of or- 
dering is well below this threshold. 

To investigate spatial ordering, we change the contin- 
uous genome to a "lattice genome" with variable lattice 
spacing d\. Potential origins can be licensed only on the 
lattice sites. For di — > 0, the lattice genome becomes 
continuous, and the model recovers the random-licensing 
case. As di increases, the lattice genome has fewer avail- 
able sites for licensing potential origins, and the fraction 
of licensed sites increases. In this scenario, the spacing 
between initiated origins take on discrete values — mul- 
tiples of di . One can imagine that a further increase in di 
would eventually lead to a critical di , where every lattice 
site would have a potential origin. This scenario corre- 
sponds to an array of periodically licensed origins, which 
leads to a periodic array of initiated origins with spac- 
ing di. Thus, by increasing a single parameter di, we 
can continuously interpolate from complete randomness 
to perfect periodicity. 

In order to compare regularized licensing to random 
licensing, we impose that the cell environment, including 
the concentrations of replicative proteins such as CDK, 
helicases, and polymerases, be the same in spite of vary- 
ing di. This constraint implies that while the potential 
origins may be distributed along the genome differently, 
the total initiation probability across the genome is con- 
served. We then write 

L/di 

I(x,t) = dil(t)^26(x-ndi) , (19) 

n=0 

where x is the position along the genome. This equation 
shows that as the number of lattice sites L/di is reduced 
via an increase in di, the initiation probability for each 
site is enhanced, resulting in more efficient potential ori- 
gins. This implies a tradeoff between the "quantity" and 
"efficiency" of potential origins. 

Figure 7 illustrates this concept of tradeoff, showing 
how Eq. 19 connects random licensing to ordered li- 
censing. A realization of random licensing is shown in 
Fig. 7(a). Since Eq. 19 modifies only the spatial distri- 
bution of origins relative to our previous I(t) 7 the effect 
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FIG. 7: Schematic diagram of licensing on a lattice genome, 
(a) Realization of replication using random-licensing (d; = 
case). The gray (white) area represents replicated (unrepli- 
cated) domains. Circles denote initiations, (b) Origins are 
forced to their nearest lattice sites (marked by vertical lines 
at multiples of di — 200 kb), while initiation times remain 
the same, (c) The result of the shift in origin positions. Open 
markers represent "phantom origins" that do not contribute 
to the replication; filled markers denote the actual origins. 
Going from di — kb in (a) to 200 kb in (c), the average 
initiation time decreased from about 22 min to about 10 min. 



of going from a continuous genome to a lattice genome 
is equivalent to shifting the randomly licensed origins to 
their nearest lattice sites while preserving their initiation 
times [Fig. 7(b)]. In doing so, we obtained Fig. 7(c), 
which shows multiple initiations on a lattice site. Since 
re-initiation is forbidden in normal replication, on each 
site only the earliest initiation contributes to the repli- 
cation. The later initiations are "phantom origins" that 
illustrate how ordering reduces the number of initiations 
but enhances the efficiency of potential origin sites. The 
increase in efficiency is indicated by the decrease in the 
average initiation times between the two scenarios. 

Having outlined the rules for licensing, we now intro- 
duce two quantities, "periodicity" P and di n t er , that will 
be useful in later discussions of how d; alters the end- 
time distribution. We first look at Pi(s), the distribution 
of the spacing between initiated origins, where s is the 
inter-origin spacing. Figure 8(a) shows two /?i(s)'s: the 
continuous one corresponds to random licensing, while 
the discrete one corresponds to setting di to 2 kb. The 
two distributions are different because of the discretiza- 
tion effect of the lattice genome: origins can have sepa- 
rations that are only multiples of di . As di increases, one 
expects a dominant spacing to appear in the system. We 
characterize this ordering effect by defining the periodic- 
ity P, the probability at the mode of the discrete inter- 
origin-spacing distribution. As an example, the di = 2 
kb distribution shown in Fig. 8(a) has P — 0.23, indicat- 
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FIG. 8: (a) The distribution of spacings between initiated 
origins, Pi(s), for the di = and 2 kb cases (2 kb is chosen to 
mimic the minimal spacing between origins reported in [25]). 
The initiation rate and fork velocity are those obtained in 
Sec. IIIB. The mean of the continuous distribution (di = 
kb case) is marked di n t er and is m 6.5 kb. The mode of the 
discrete distribution (d; = 2 kb case) is marked by " * ." The 
probability P at the mode (0.23 in this case) is defined to 
be the periodicity, a measure of ordering in the system, (b) 
Average inter-origin spacing d ava as a function of di . There is 
a gradual transition from Regime I to Regime II. In Regime 
I (di < di n ter), d avg is asymptotically independent of di for 
di — > 0. In Regime II (d; > d inter ), d avg is asymptotically 
linearly proportional to di. Inset shows the periodicity P as 
a function of di. 



ing that 23% of the spacings have the same value. In the 
fully periodic case, the probability at the mode is 1, as 
all the spacings have the same value: the system is then 
100% periodic (P = 1). For di — > 0, P should be inter- 
preted as the mode of Pi(s) times a vanishingly small As 
(~ di). Thus, P — ■> in the small As limit, as there will 
be no inter-origin spacings sharing the same size. 

In interpolating from random licensing to periodic li- 
censing, one expects that the average inter-origin spacing 
d aV g would change from being (^-independent to being 
linearly dependent on d/. Indeed, from Fig. 8(b), which 
shows d aV g as a function of di, we can label two asymp- 
totes and thereby identify two regimes. We first intro- 
duce di n ter to be the average inter-origin spacing of the 



kb case. For di — » 0, we see that d aV g asymptoti- 
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FIG. 9: (Color online) Evolution of the end-time distribution 
with increasing spatial ordering due to increasing d;. Each 
horizontal profile is an end-time distribution. In Regime I, the 
end-time distribution does not change appreciably; in Regime 
II, the mode shifts to the right. The ordering threshold is at 
di — dinter ~ 6.5 kb. The dashed line shows the di = 2 
kb end-time distribution, which corresponds to the lateral 
inhibition ordering observed experimentally [25]. 



cally approaches di nte r- In contrast, for large di (when all 
lattice sites are occupied), we see d avg approaching the 
asymptote d avg — di . The intersection of the two asymp- 
totes is precisely at di = dinter- We therefore identify 
two regimes with Regime I being d; < di nter & n( i Regime 
II being di > di nter - Physically, the weak di dependence 
in Regime I suggests that the system is spatially random, 
whereas the asymptotically linear behavior in Regime II 
indicates that the system is becoming periodic. 

The length scale d inter encodes the two factors that 
determine the distribution of inter-origin spacings. The 
first factor is the passive replication of closely positioned 
potential origins, which suppresses the likelihood of hav- 
ing small inter-origin spacings. The second factor is 
based on the low probability of randomly licensing two 
far-away origins, which reduces the probability of having 
large inter-origin gaps. Both of these effects can be seen 
in Fig. 8(a). 

When di exceeds dinter, the typical spacing between 
potential origins (~ di) exceeds the typical range of pas- 
sive replication and approaches the typical largest spac- 
ing of the random-licensing case. This means that poten- 
tial origins are not likely to be passively replicated or po- 
sitioned farther than di apart (note that the next smallest 
spacing 2d; is quite large). The inset in Fig. 8(b), which 
shows the periodicity P as a function of di, strengthens 
this notion that for di > di nter , the system enters a nearly 
periodic regime where P has saturated. 

Our main result is Fig. 9, which shows how the end- 
time distribution changes with increasing di. The ini- 
tiation function used in the simulation is the power- 
law approximation of the I V i VO (t) found in Sect. IIIB, 



transformed using Eq. 19. The fork velocity and failure 
rate used are as extracted in Sect. III. There are again 
two distinct regimes separated by the ordering threshold 
dinter ~ 6.5 kb. Below the threshold (Regime I), the 
end-time distribution is nearly independent of di . Above 
the threshold (Regime II), the mode shifts to the right. 
The width is unaltered. 

To understand the changes in going from Regime I 
to Regime II, we note that in Eq. 5, t* depends on the 
number of initiations N a . On average, N a is unaffected 
when the number of lattice sites available is in excess 
(i7§7 > ■"•)• This means that t* starts to change only 
when d\ = L/N a which is precisely d inter . In Regime II, 
the minimum time to replicate the smallest gap between 
potential origins, di/v, becomes significant compared to 
the temporal randomness resulting from stochastic initi- 
ation. In effect, t* sa di/v+t avg , where t avg is the average 
initiation time. We tested numerically that the mean and 
standard deviation of the initiation times both decrease 
sigmoidally, for di/ dinter > 3. Thus, for the range of di 
shown in Fig. 9, one expects t* oc di in Regime II, while 
the width should be unaltered. 

In Xenopus embryos, the inhibition zone observed in 
plasmid replication corresponds to di ~ 2 kb. This is 
shown as the dashed line in Fig. 9. The value is well 
below the ordering threshold of d in ter ~ 6.5 kb, strongly 
suggesting that the experimentally observed spatial or- 
dering is not strong enough to be relevant to the random- 
completion problem in embryonic replication. Recall 
that biologists have proposed two models for solving the 
random-completion problem, the regular-spacing model 
and the origin-redundancy model. Our results suggest 
that to employ the regular-spacing strategy, a cell would 
need mechanisms for measuring the spacing between ori- 
gins and also mechanisms that ensure the early initiation 
of each potential origin. Such mechanisms have not been 
found to date. On the other hand, the cell can solve the 
problem by using the origin-redundancy strategy, where 
more potential origins are licensed than needed. Cells can 
then accurately control the replication time by increas- 
ing the initiation rate, perhaps by importing the required 
proteins into the cell nucleus. Both mechanisms in the 
latter strategy are "open loop" in that they do not re- 
quire any information about the replication state of the 
cell. 



VI. CONCLUSION 

In this paper, we have extended the stochastic 
nucleation-and-growth model of DNA replication to de- 
scribe not only the kinetics of the bulk of replication but 
also the statistics of replication quantities at the end of 
replication. Using the model, we have quantitatively ad- 
dressed the random-completion problem, which asks how 
stochastic licensing and initiation lead to the tight con- 
trol of replication end-times observed in systems such as 
Xenopus embryos. We found that the fluctuation of the 
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end-times can be suppressed by increasing the number of 
potential origins, while the typical end-time can be ad- 
justed by holding back initiations until later in S phase. 
Further, we analyzed the effect of spatial ordering on the 
replication end-time using a lattice-genome. Our results 
show that the end-time distribution is not affected until 
an ordering threshold is reached. We also showed that 
the observed ordering effect of lateral inhibition in Xeno- 
pus is well below the ordering threshold. 

Comparing the two proposed solutions to the random- 
completion problem, the origin-redundancy model and 
the regular-spacing model, our results strongly suggest 
that embryonic replication takes the origin-redundancy 
approach. In fact, given that initiations are stochastic 
in time, the regular-spacing model does not improve the 
end-time control much even when potential origins are 
periodically positioned. This suggests that our stochas- 
tic model describes embryonic replication well and that 
spatial ordering mechanisms play a minor role in regu- 
lating replication times. 

We have also found the optimal I{t) that minimizes 
the maximum number of simultaneously active forks. 
Like the observed in vitro initiation function, it increases 
throughout S phase except for the end. Further pursuit 
of the optimization problem with more detailed model 
may reveal the rate-limiting factors in replication, which 
have not been identified to date. Further, an open issue 
not addressed by our model is the observation that there 
is a weak clustering effect in the initiations of neighbor- 
ing origins [18]. To model this effect, one can introduce 
correlations in licensing, initiation, and fork progression 
based on localization of replication foci [ ] , chromatin 
structures [58], or some other mechanisms. We do not 
expect that correlations will modify the scenario we have 
presented here significantly, as the most significant effect 
of correlations, an increase in spatial ordering, would not 
be important even at exclusion-zone sizes that are much 
larger than observed (e.g., 10 kb). 

Among the various cases of replication programs, repli- 
cation in bacteria is the most well understood — DNA 
synthesis starts at a single, sequence-specific genome site 



and proceeds to completion [5! ]. With this case, the 
genome-wide regulation of the replication process is de- 
terministic and strictly governed by biochemical effects. 
In this work, we modeled a very different type of repli- 
cation program, where both the licensing and initiation 
timings are strongly influenced by stochastic effects. This 
type of stochastic replication strategy is usually present 
in embryos, especially those that develop outside of par- 
ents' body, for rapid development of the embryos. We 
showed how stochastic effects ensure the fast and reli- 
able replication needed for rapid development. 

In between these two special cases lie all other replica- 
tion programs, where the licensing mechanisms are more 
complicated than simple sequence targeting or a Poisson 
process [60]. These include replication in non-embryonic 
(somatic) cells [(if], in simple organisms such as yeast 
[62] , and in cancer cells where abnormal replication such 
as re-replication can occur [5]. For instance, replica- 
tion experiments done on fission-yeast cells also show 
an excess number of potential origins. In that case, the 
positions of the origins are associated with a sequence 
asymmetry of the genome instead of being completely 
sequence independent [63]. The experiments also show 
that different origins have different efficiencies, suggest- 
ing that initiation timings are not entirely stochastic. In 
future work, we shall modify our model to study such sys- 
tems, which may lead to a fuller understanding of how 
replication is regulated by the genome organization and 
by DNA replication strategies that are the outcome of 
evolution. 



Acknowledgments 

We thank Nick Rhind and Michel Gauthier for helpful 
discussions and a thorough reading of the manuscript. 
We thank Olivier Hyrien for informing us of unpublished 
results and data. This work was funded by NSERC 
(Canada) and by HFSP. 



[1] J. Herrick and A. Bensimon, Chromosoma 117, 243 
(2008). 

[2] J. F. Diffley, Genes & Dev. 10, 2819 (1996). 
[3] E. E. Arias and J. C. Walter, Genes & Dev. 21, 497 
(2007). 

[4] C. Hensey and J. Gautier, Mech. Dev. 69, 183 (1997). 
[5] R. D. Micco, M. Fumagalli, A. Cicalese, S. Piccinin, 

P. Gasparini, G Luise, G Schurra, M. Garre, P. G. Nu- 

ciforo, A. Bensimon, et al., Nature 444, 638 (2006). 
[6] A. Sancar, L. A. Lindsey-Boltz, K. Unsal Kagmaz, and 

S. Linn, Annu. Rev. Biochem. 73, 39 (2004). 
[7] A. Bensimon, A. Simon, A. Chiffaudel, V. Croquette, 

F. Heslot, and D. Bensimon, Science 265, 2096 (1994). 
[8] J. Herrick and A. Bensimon, Biochimie 81, 859 (1999). 
[9] J. Herrick, P. Stanislawski, O. Hyrien, and A. Bensimon, 



J. Mol. Biol. 300, 1133 (2000). 
[10] K. Marheineke and O. Hyrien, J. Biol. Chem. 276, 17092 
(2001). 

[11] J. Herrick, S. Jun, J. Bechhoefer, and A. Bensimon, J. 

Mol. Biol. 320, 741 (2002). 
[12] O. Hyrien and M. Mechali, EMBO J. 12, 4511 (1993). 
[13] S. Jun, H. Zhang, and J. Bechhoefer, Phys. Rev. E 71, 

011908 (2005). 

[14] S. Jun and J. Bechhoefer, Phys. Rev. E 71, 011909 
(2005). 

[15] J. Bechhoefer and B. Marshall, Phys. Rev. Lett. 98, 
098105 (2007). 

[16] O. Hyrien, K. Marheineke, and A. Goldar, BioEssays 25, 
116 (2003). 

[17] D. Kimelman, M. Kirschner, and T. Scherson, Cell 48, 



15 



399 (1987). 

[18] J. J. Blow, P. J. Gillespie, D. Francis, and D. A. Jackson, 
J. Cell Biol. 152, 15 (2001). 

[19] The durations of the embryonic cell cycle depends on 
temperature. For this paper, we take the cell cycle time 
to be 25 min at 23° C [ ]. A typical duration of S phase 
used is « 20 min [16, 18]. Longer times (25 min for S 
phase and 30 min for the cell cycle) have been observed 
at 20° C [56]. 

[20] T. A. Prokhorova, K. Mowrer, C. H. Gilbert, and J. C. 

Walter, Proc. Natl. Acad. Sci. 100, 13241 (2003). 
[21] C. Hensey and J. Gautier, Dev. Biol. 203, 36 (1998). 
[22] R. A. Laskey, J. Embryol. Expe. Morphol. Suppl. 89, 285 

(1985). 

[23] J. Walter and J. W. Newport, Science 275, 993 (1997). 

[24] N. Rhind, Nat. Cell Biol. 8, 1313 (2006). 

[25] I. Lucas, M. Chevrier-Miller, J. M. Sogo, and O. Hyrien, 

J. Mol. Biol. 296, 769 (2000). 
[26] A. N. Kolmogorov, Bull. Acad. Sc. USSR, Phys. Ser. 1, 

335 (1937). 

[27] W. A. Johnson and F. L. Mehl, Trans. AIME 135, 416 
(1939). 

[28] M. Avrami, J. Chem. Phys. 7, 1103 (1939); 8, 212 (1940); 
9, 177 (1941). 

[29] K. Sekimoto, Physica A 125, 261 (1984); 135, 2 (1986). 
[30] K. Sekimoto, Int. J. Mod. Phys. B 5, 1843 (1991). 
[31] E. Ben-Nairn and P. L. Krapivsky, Phys. Rev. E 54, 3562 
(1996). 

[32] It is known that when DNA polymerases introduce copy- 
ing errors in the process of synthesizing nucleotides, repli- 
cation forks stall to repair the mispairing [59]. These fork 
stalls occur stochastically in time. Thus, in our model, a 
constant fork velocity is a mean-field approximation of 
the time- varying fork velocities and the fork-to-fork vari- 
ation in progression speed. 

[33] K. Marheineke and O. Hyrien, J. Biol. Chem. 279, 28071 
(2004). To test whether replacing the observed decreasing 
fork velocity with its average is a valid approximation, we 
simulated the two cases with a modified version of the 
phantom-nuclei algorithm discussed in Sec. II. We found 
that the tails of the coalescence distributions for the two 
cases agree to ~ 1%, indicating that the two cases also 
have similar end-time distributions. 

[34] C. Conti, B. Sacca, J. Herrick, C. Lalou, Y. Pommier, 
and A. Bensimon, Mol. Bio. Cell 18, 3059 (2007). 

[35] W. Krauth, Statistical Mechanics: Algorithms and Com- 
putations (Oxford University Press, Oxford, 2006). 

[36] W. H. Press, S. A. Teukolsky, V. William T, and B. P. 
Flannery, Numerical Recipes: The Art of Scientific Com- 
puting (Cambridge University Press, New York, 2007), 
3rd ed. 

[37] Wavemetrics Inc. (2008), URL http://www. 
wavemetrics . com/ . 

[38] E. J. Gumbel, Statistics of Extremes (Columbia Univ. 
Press, New York, 1958). 

[39] S. Kotz and S. Nadarajah, Extreme Value Distributions 
(Imperial College Press, London, 2000). 

[40] R.-D. Reiss and M. Thomas, Statistical Analysis of Ex- 
treme Values (Birkhauser Verlag, Basel, 2001), 2nd ed. 

[41] G. Caldarelli, F. D. Di Tolla, and A. Petri, Phys. Rev. 
Lett. 77, 2503 (1996). 

[42] A. A. Moreira, J. S. Andrade, and L. A. Nunes Amaral, 
Phys. Rev. Lett. 89, 268703 (2002). 

[43] H. Tijms, Understanding Probability: Chance Rules in 



Everyday Life (Cambridge University Press, Cambridge, 
2004). 

[44] The original inference of the in vitro data was fitted us- 
ing a bi-linear function [11]. Using the transformation in 
Sec. Ill to transpose the bi-linear Ivitro, we obtained a 
scaled bi-linear I vivo- For ease of calculation, in parts of 
the paper, we approximate both bi-linear functions with 
power-law functions. Both initiation functions are ap- 
proximately quadratic (Ivitro ~ t 2 ' 62 and I v ivo ~ t 2A5 ). 

[45] Ivitro is approximately quadratic except for a decrease 
[I(t) — > 0] toward the end of S phase [ ]. Since the errors 
on the decreasing part of the data are large, we neglected 
the decrease in our previous work [13, 15]. However, a 
recent repeat of the original experiment has shown that 
the decrease is not an artifact of poor statistics but repre- 
sents a real feature of the replication process [56]. Thus, 
we examined how the decrease in I(t) affects the end- 
time distribution. We found that with the decrease, the 
mode of the distribution was delayed by ~ 0.3% and the 
width increased by 15%. These quantities approximately 
translate into an S phase that is prolonged by « 0.5 min. 
This difference is small compared to the overall duration 
of S phase (20 min). 

[46] J. P. Sethna, Statistical Mechanics: Entropy, Or- 
der Parameters, and Complexity (Oxford Univer- 
sity Press, Oxford, 2006). The derivation of the 
Gumbel distribution appears in a web-supplement, 
URL http: //pages .physics . Cornell . edu/ sethna/ 
StatMech/NewExercises .pdf . 

[47] C. H. Thiebaud and M. Fischberg, Chromosoma 59, 253 
(1977). 

[48] S. P. Bell and A. Dutta, Annu. Rev. Biochem. 71, 333 
(2002). 

[49] M. Kimmel and D. E. Axelrod, Branching Processes in 
Biology (Springer- Verlag, New York, 2002). 

[50] In [15], we estimated a failure rate F w 10~ 4 using a 
simple model that neglected the complications due to the 
branching process. Since t** depends on the logarithm of 
F from Eq. 12, the factor of 30 between the two estimates 
of F results only in a roughly 1 min shift in the t** of the 
end-time distribution and does not alter the qualitative 
conclusion of the previous work. 

[51] J. C. Spall, Introduction to Stochastic Search and Opti- 
mization (J. Wiley, New Jersey, 2003). 

[52] J. D. Moore, J. Yang, R. Truant, and S. Kornbluth, J. 
Cell Biol. 144, 213 (1999). 

[53] V. Q. Nguyen, C. Co, K. Irie, and J. J. Lib, Current Biol. 
10, 195 (2000). 

[54] L. Postow, N. J. Crisona, B. J. Peter, C. D. Hardy, and 

N. R. Cozzarelli, Proc. Natl. Acad. Sci. 98, 8219 (2001). 
[55] S. Skogestad and I. Postlethwaite, Multivariable Feedback 

Control (J. Wiley, Chichester, 2005), 2nd ed. 
[56] O. Hyrien (private communication). 
[57] P. Meister, A. Taddei, A. Ponti, G. Baldacci, and S. M. 

Gasser, EMBO J. 26, 1315 (2007). 
[58] S. Jun, J. Herrick, A. Bensimon, and J. Bechhoefer, Cell 

Cycle 3, 223 (2004). 
[59] H. Lodish, A. Berk, C. A. Kaiser, M. Krieger, M. P. Scott, 

A. Bretscher, H. Ploegh, and P. Matsudairu, Molecular 

Cell Biology (W.H. Freeman and Company, New York, 

2008), 6th ed. 
[60] D. M. Gilbert, Science 294, 96 (2001). 
[61] M. Anglana, F. Apiou, A. Bensimon, and M. Debatisse, 

Cell 114, 385 (2003). 



1G 



[62] P. K. Patel, B. Arcangioli, S. P. Baker, A. Bensimon, and 

N. Rhind, Mol. Biol. Cell 17, 308 (2006). 
[63] C. Heichinger, C. J. Penkett, J. Bahler, and P. Nurse, 

EMBO J. 25, 5171 (2006). 



